MULTIVARIABLE AND MULTIGROUP RECEIVER OPERATING CHARACTERISTICS
CURVE  ANALYSES  FOR  QUALITATIVE  AND  QUANTITATIVE  ANALYSIS 


1. INTRODUCTION

The  extensive  amount  of  data  generated  from  a  set  of  experiments  can  seem  daunting 
as  one  approaches  the  sample  determination  and  interpretation  phases.  It  is  desirable  to  extract  as 
much  qualitative  and  quantitative  information  as  possible  from  an  experimental  analysis. 

Usually,  replicate  analyses  are  mandatory  for  consideration  of  the  error  inputs  associated  with  an 
experimental  set  of  data.  Experimental  designs  that,  for  example,  rely  on  spectroscopy, 
spectrometry,  or  chemical  shift  responses  can  easily  generate  an  overwhelming  amount  of 
information  for  a  typical  experiment.  When  experimental  data  from  replicate  analyses  are 
combined  with  a  suite  of  different  sample  groups  for  comparison  purposes,  visual  interpretations 
yield  relatively  poor  conclusions  for  decision-making  purposes. 

During  the  data  analysis  and  interpretation  phases,  it  is  important  to  extract  as  much 
information  as  possible.  Presentation  of  the  raw  data  into  a  finished  product  requires  the  use  of 
careful,  well-thought-out  statistical  procedures.  Thus,  the  data  reduction  phase  is  an  important 
step  in  data  analysis  that  provides  a  critical  bridge  between  the  raw  data  and  the  interpretation 
and  decision-making  processes. 

Multivariate  analysis  is  an  attractive  data  reduction  technique  that  is  used  to  convert 
an  extensive  amount  of  experimental  sample  data  into  a  highly  reduced  set  of  qualitative,  visual 
information.  A  typical  data  record  can  contain  many  hundreds  to  thousands  of  variables  such  as 
wavelength, wavenumber, mass-to-charge ratio (m/z), retention time, or chemical shift. By
converting experimental records into points, a visual two- or three-dimensional plot can be
obtained that consists of a dispersion of points in which each point is a complete experimental
record. Depending on the data set and type of input, the principal component, discriminant,
canonical variate, and dendrogram data reduction analyses can be implemented. The data records
are reduced using the principle of a linear combination of variables. In general, most data sets do
not  follow  or  exhibit  linear  behavior.  Nevertheless,  multivariate  data  analysis  is  a  widely  used 
technique  for  mathematically  forcing  nonlinear  data  sets  into  a  linear  model.  This  can  cause 
distortion  of  the  data  set  during  the  data  reduction  analysis  phase.  In  addition,  condensing  many 
hundreds  of  variables  (dimensions)  that  are  resident  in  a  multivariate-dimensional  data  space  into 
a  two-  or  three-dimensional  plot  inherently  produces  a  distorted  view  of  the  relative  positioning 
of  the  experimental  data  points  in  the  original  multidimensional  data  space. 

Multivariate  data  analysis  provides  qualitative  accounting  for  a  set  of  experiments 
including  interpretation  and  decision-making.  A  database  can  be  constructed  from  a  known  set  of 
substances  to  characterize  and  possibly  identify  an  unknown  or  suspect  sample  with  respect  to  its 
presence  in  the  database.  These  are  practical  and  important  objectives  for  qualitative  data 
analysis.1-4


Another  objective  in  data  interpretation  and  decision-making  concerns  the 
quantitative  information  component.  Characterization  and  identification  of  the  data  set  with 
specific examples are important tasks, but the reliability, sensitivity, and specificity of the
technique  are  just  as  important.  The  decision-making  process  relies  on  these  figures  of  merit. 

Receiver  operating  characteristics  (ROC)  curve  analysis  was  developed  in  the  field  of 
statistical decision theory and was broadened in the 1950s to the field of signal detection theory
as a means of enabling radar operators to distinguish between enemy targets, friendly forces, and
noise.5,6 ROC curves report on the quantitative aspects of a data set in an analysis.7-10 Replicate
data are essential for an ROC analysis, because the replicate information contains various
sources of error that inherently affect the analysis reliability and experimental error. These
figures of merit factor into the decision-making process in a plot of sensitivity [true positive
(TP)] on the ordinate and the complement of selectivity or specificity (1 - TN = FP) on the
abscissa, where TN is true negative and FP is false positive. TP and TN characterize the data set
and the reliability of decisions that are made on the experimental data set. An important
parameter in ROC curve analysis is the area under the curve (AUC). The AUC is a separation
measurement11-15 between two groups of interest (positive/negative, healthy/diseased, go/no go,
control/test subjects, present/absent, Group A/Group B, or green/red, to name a few).

The  AUC  can  also  be  considered  as  a  measure  of  how  well  a  variable  can  distinguish 
between  two  diagnostic  groups.  The  AUC  directly  translates  into  the  TP  and  false  negative  (FN) 
parameters,  which  are  fundamental  to  decisions  or  conclusions  from  the  data  set  observations 
and  responses,  including  sample  discrimination  and/or  future  directions  of  analysis.  Each  point 
on  an  ROC  curve  represents  a  sensitivity/specificity  pair  corresponding  to  a  particular  decision 
threshold.  A  cutoff  or  threshold  value  is  merely  a  perpendicular  line  drawn  on  a  standard 
frequency plot of two distributions.16 A series of perpendicular lines is drawn throughout the
two frequency distributions, and the lines represent sensitivity and selectivity pairs of points to
characterize  the  control  and  sample  distributions  of  data. 
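
The threshold sweep described above can be sketched in a few lines. This is a minimal illustration and not the authors' code: the function name `roc_points`, the toy responses, and the threshold values are all hypothetical. For each cutoff, it reports the fraction of each group lying on or to the left of the line, with the left-hand group on the ordinate (TP) and the right-hand group on the abscissa (1 - TN = FP).

```python
def roc_points(left_group, right_group, thresholds):
    """Return one (fp, tp) pair per threshold.

    tp: fraction of left_group at or to the left of the threshold (ordinate).
    fp: fraction of right_group at or to the left of the threshold (abscissa).
    Each fraction is taken with respect to its own group only.
    """
    points = []
    for t in thresholds:
        tp = sum(v <= t for v in left_group) / len(left_group)
        fp = sum(v <= t for v in right_group) / len(right_group)
        points.append((fp, tp))
    return points


# Hypothetical responses for a control group and a sample group
control = [1.0, 2.0, 2.5, 3.0]
sample = [2.5, 3.5, 4.0, 5.0]
points = roc_points(control, sample, thresholds=[1.5, 2.5, 3.5, 5.0])
```

Each tuple in `points` is one sensitivity/selectivity pair, so plotting the tuples in order traces the ROC curve for this toy pair of distributions.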

An ROC curve analysis retains the integrity of the data throughout the analysis. That
is, unlike multivariate data analysis, a linear or nonlinear data set retains its integrity in the ROC
analysis. When several ROC curves are compared, the AUC is usually the best discriminator.11-15
The AUC is calculated by the extended trapezoidal rule.11-14 ROC curve analysis is a univariate
technique,  and  multivariate  data  analysis  typically  uses  many  variables  (variates);  hence,  the 
term  multivariate  analysis. 

Scurfield17 offered an extensive suite of experiments for arriving at an analysis of the
volume under the surface (two variables and their frequency) or hypervolume under the volume
(manifold, using three variables and their frequency). Hundreds of experiments were required to
delineate the boundaries and internal data space of the volumes to produce an overall accounting
of the response. Li and Fine18 took the entire data set of Scurfield17 and used bootstrap inference
probability estimation methods to statistically reduce the data set for a more manageable data
analysis algorithm. Yiannoutsos et al.19 used biochemical procedures that provided three classes
of medical outcomes for human immunodeficiency virus patients. The three classes were
partitioned over the same single variable, and each partition region was treated separately. A
volume was calculated by plotting the (x, y, z) coordinates of the frequency probabilities from the
response distribution of the one variable for the three classes.10 The volume under the surface of
the responses (probability) was deemed equivalent to the AUC of a typical ROC curve. This
analysis is unique to three classes being partitioned by one variable.

We  have  developed  an  algorithm  that  applies  ROC  curve  analysis  on  the  intensity 
distribution  for  each  variable  in  an  experimental  record  over  an  entire  data  set  of  experiments.  In 
Part  1  of  the  analysis,  a  frequency  versus  intensity  distribution  is  constructed  independently  for 
each variable. Instead of the AUC, we used the area between the curve and the diagonal line
(ACD). Variables displaying a relatively high ACD were retained and variables with relatively low
ACD  values  were  considered  as  noise  or  as  not  relevant  for  discrimination  purposes.  The 
operation  of  Part  1  produced  a  reduced  set  of  variables  that  provided  information  relevant  to  the 
samples. In Part 2, a series of interrogations was undertaken in which two variables at a time
were used to plot their (x, y) intensity pairs throughout all the experiments (cases). A vector with
a given angle from the abscissa was drawn, a frequency distribution was produced, an ROC
curve was constructed from the frequency distribution, and the ACD was noted. The angle of the
vector was incremented, and a new ROC curve was constructed. This occurred in increments
over 360°. The angle with the highest ACD value and its vector were retained for those two
variables. The vector for the first two variables (V1,2) became the next independent axis
(abscissa), and the third variable intensity (V3) formed the dependent axis (ordinate). The
cases resident on V1,2 became the x values for the y values from the respective cases on V3
(variable 3 ordinate). The ROC curve analysis was continued to produce a V1-3 vector (vectors
1, 2, and 3) and corresponding angle 1-3 (angles 1, 2, and 3). The process was repeated for every
variable and resulted in (vector, angle) pairs in which the total number of pairs was equal to the
number of variables minus 1. This database can therefore be used to investigate an unknown or
target  experimental  output  such  as  the  probability  of  a  spectrum  belonging  to  a  sample  reference 
database. 


A  univariate  statistical  technique  has  been  created  that  combines  the  fundamental 
characteristics  of  multivariate  data  analysis  and  the  quantitative  information  of  an  ROC  curve. 


2.  THEORY 

Details  of  the  reduction  of  variables  are  presented  in  the  Discussion  section.  The 
mathematical  steps  of  the  multivariable,  multigroup  ROC  curve  approach  are  presented  as 
follows.  Appendix  A  contains  all  tables,  and  Appendix  B  contains  all  figures. 

2.1  Step  1 

A point plot of the responses for the first two variables (V1 and V2) in Table 1 is
constructed. A vector labeled V1,2 is incrementally rotated between 0 and 360° through the
four-quadrant point plot. At each increment, the angle of the vector (α) is noted with respect to the
origin. At angular increments, a frequency distribution of the points is made with respect to the
vector acting as the abscissa. The first vector is the original abscissa, labeled 0° rotation. From
the frequency distribution, an ROC curve is constructed,7-15 and the ACD is calculated. The
vector angle is incremented to a new α, and a new frequency distribution is constructed with
respect to that vector's new angle placement. An ROC curve is plotted from this new frequency
distribution, and the ACD is noted. This occurs for every angular increment about the origin.
Equations  1  and  2  provide  the  mathematical  details: 

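$$(\mathrm{V1,2})_i = \left(\mathrm{V1}_i^{2} + \mathrm{V2}_i^{2}\right)^{0.5} \cos(\alpha_0 - \alpha) \tag{1}$$

where $\alpha$ is the angle of the vector and

$$\alpha_0 = \cos^{-1}\!\left(\frac{\mathrm{V1}_i}{\left(\mathrm{V1}_i^{2} + \mathrm{V2}_i^{2}\right)^{0.5}}\right); \quad i = 1, 2, 3, \ldots, m \ \text{(rows)} \tag{2}$$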
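
As a hedged illustration of Equations 1 and 2, the sketch below projects a single record onto a vector at a chosen angle. The function name `project` and the sample record are hypothetical. One point worth noting is that the inverse cosine in Equation 2 returns an angle between 0° and 180°, so the formula as written locates the record correctly only when the ordinate value (V2) is nonnegative; the sketch assumes that condition.

```python
import math


def project(v1, v2, alpha_deg):
    """Projection of the record (v1, v2) onto a vector at angle alpha_deg.

    Follows Equations 1 and 2 literally; assumes v2 >= 0 (see lead-in).
    """
    r = math.hypot(v1, v2)              # (V1^2 + V2^2)^0.5
    if r == 0.0:
        return 0.0                      # the origin projects to zero at any angle
    alpha0 = math.acos(v1 / r)          # Equation 2
    return r * math.cos(alpha0 - math.radians(alpha_deg))   # Equation 1


# Hypothetical record with V1 = 3.0 and V2 = 4.0
at_0 = project(3.0, 4.0, 0.0)      # vector along the original abscissa
at_90 = project(3.0, 4.0, 90.0)    # vector along the original ordinate
```

At 0° the projection recovers V1 and at 90° it recovers V2, which is a quick check that the vector acts as a rotated abscissa.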

2.2  Step  2 

The α and vector where the ACD is at a maximum are denoted $\alpha_{1,2}$ and V1,2 for that
pair of variables:

$$(\mathrm{V1,2})_i = \left(\mathrm{V1}_i^{2} + \mathrm{V2}_i^{2}\right)^{0.5} \cos(\alpha_0 - \alpha_{1,2}) \tag{3}$$


2.3  Step  3 

Steps 1 and 2 are repeated using the new combined column V1,2 from Step 2 with the
next column V3 (see Table 1) to form vector V1-3.

2.4  Step  4 

Steps  1  through  3  are  repeated  until  all  the  columns  are  combined. 

This  approach  reduces  the  original  data  matrix  (Table  1)  into  one  vector  that 
combines  the  data  from  the  four  variables  for  all  replicate  measurements.  The  approach  also 
identifies the maximum delineation and probability of separation between two groups of
responses. 
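
Steps 1 through 4 can be sketched as follows. This is an illustrative reading of the procedure, not the authors' implementation: the names `acd`, `project`, `best_angle`, and `merge_columns`, the 10° default step, and the toy records are assumptions. The projection uses x cos α + y sin α, which equals the right-hand side of Equation 1 when the ordinate is nonnegative, and the ACD is computed with a threshold at every observed value and the trapezoidal rule.

```python
import math


def acd(left, right):
    """Signed area between the ROC curve and the diagonal (AUC - 0.5)."""
    fp_prev = tp_prev = area = 0.0
    for t in sorted(set(left) | set(right)):
        tp = sum(v <= t for v in left) / len(left)
        fp = sum(v <= t for v in right) / len(right)
        area += (fp - fp_prev) * (tp + tp_prev) / 2.0   # trapezoidal rule
        fp_prev, tp_prev = fp, tp
    return area - 0.5


def project(x, y, alpha_deg):
    """Projection of (x, y) onto a vector at angle alpha_deg."""
    a = math.radians(alpha_deg)
    return x * math.cos(a) + y * math.sin(a)


def best_angle(xa, ya, xb, yb, step=10):
    """Steps 1 and 2: scan the vector angle and keep the maximum-ACD angle."""
    best = None
    for alpha in range(0, 360, step):
        pa = [project(x, y, alpha) for x, y in zip(xa, ya)]
        pb = [project(x, y, alpha) for x, y in zip(xb, yb)]
        score = acd(pa, pb)
        if best is None or score > best[0]:
            best = (score, alpha, pa, pb)
    return best


def merge_columns(group_a, group_b, step=10):
    """Steps 3 and 4: fold the columns into one vector, one column at a time."""
    cur_a = [row[0] for row in group_a]
    cur_b = [row[0] for row in group_b]
    score, angles = acd(cur_a, cur_b), []
    for col in range(1, len(group_a[0])):
        score, alpha, cur_a, cur_b = best_angle(
            cur_a, [row[col] for row in group_a],
            cur_b, [row[col] for row in group_b], step)
        angles.append(alpha)
    return score, angles


# Hypothetical records (rows) with three variables (columns) per group
group_a = [[1, 1, 0], [2, 1, 1], [1, 2, 0]]
group_b = [[3, 3, 1], [4, 3, 0], [3, 4, 1]]
score, angles = merge_columns(group_a, group_b)
```

The returned list holds one retained angle per merge, so its length is the number of variables minus 1, and `score` is the ACD of the final combined vector.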


3.  RESULTS  AND  DISCUSSION 

3.1  Reduction  of  Variables 

The  multivariate  ROC  curve  method  is  illustrated  using  the  classic  Fisher  data  set. 
This data set consists of the sepal and petal widths and lengths of 50 individual flowers from each of Iris
setosa, I. versicolor, and I. virginica. The three iris species constitute three separate groups, and
the  sepal  and  petal  widths  and  lengths  represent  four  distinct  variables.  The  Fisher  data  set  is 
commonly  used  as  a  primer  or  example  for  multivariate  data  analysis  presentations.  However, 
the  analysis  herein  follows  the  ROC  curve  procedure,  which  uses  the  response  of  one  variable  to 
discriminate  between  two  groups.  This  procedure  is  strengthened  by  the  systematic  adding  or 
“merging”  of  all  of  the  variable  responses.  Part  1  provides  an  analysis  for  arriving  at  the  fewest 
number  of  variables  containing  the  greatest  amount  of  information. 

3.1.1 I. versicolor and I. virginica


Table  1  presents  the  original  experimental  data  collected  by  Fisher  and  includes  the 
lengths  and  widths  of  the  sepals  and  petals  of  the  three  iris  flower  species.  The  analysis  began  by 
establishing  the  integrity  of  the  responses  of  an  individual  variable  with  respect  to  an  ROC  curve 
determination. The sepal lengths of I. versicolor and I. virginica were plotted in two frequency
distributions for an ROC curve determination on the degree of separation between the two
species. Figure 1 provides two histograms (frequency distributions) of the sepal lengths of the
I. versicolor (filled circles) and I. virginica (triangles) species from Table 1. There were a total of
50  points  for  each  species;  therefore,  adding  the  filled  circle  or  triangle  ordinate  values  each 
yields  50. 


A typical ROC curve analysis consists of moving a vertical threshold or cutoff line
from the left to the right (as in Figure 1) in increments (the increment here was 3.3 mm on the
abscissa). In Figure 1, six vertical lines, or thresholds, are presented. A plot of the ROC curve is
simply a representation of what percentage or fraction of each group resides on and to the left of
a selected line. Usually, the left-hand distribution is chosen as the starting point of the analysis.
The fraction of points on and to the left of line 1 consisted of 8% (4/50 = 0.08) I. versicolor
(filled circles) and 2% (1/50 = 0.02) I. virginica (triangles). The fraction of each group was
determined with respect to its own distribution. Thus, there was no meaning in the addition of the
two percentages (pair of points, top of Figure 1). Lines 2-6 followed suit as I. versicolor, I.
virginica: 42%, 6%; 78%, 38%; 98%, 76%; 100%, 88%; and 100%, 100%. The six points were
marked in Figure 2 and provided the basis for the ROC curve. The I. versicolor and I. virginica
species were plotted as fractions on the ordinate and abscissa, respectively. The analysis was
with respect to the left-hand distribution; therefore, the left-hand distribution took the label of TP
on the ordinate. The right-hand distribution took the form of the abscissa, or 1 - TN = FP. The
ROC curve was obtained as in Figure 2, and the points corresponding to each threshold in
Figure 1 were marked accordingly. The ACD, which is the area between the curve and the 45°
diagonal line in Figure 2, could be calculated. Considering that the entire square space enclosed
by 0, 0; 0, 1; 1, 0; and 1, 1 had an area equal to 1.0, the diagonal line provided two regions of
0.5 each.
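
As a rough check of the geometry, the six threshold pairs quoted above can be joined by straight segments and integrated with the trapezoidal rule. The helper name `trapezoid_area` is hypothetical, and the curve is assumed to start at the origin. With only these six thresholds the estimate comes out near 0.276, which is close to, but not the same as, the ACD of 0.2896 reported for Figure 2; the reported value presumably rests on a finer set of thresholds.

```python
# (FP, TP) pairs for lines 1-6, taken from the percentages quoted in the text
points = [(0.02, 0.08), (0.06, 0.42), (0.38, 0.78),
          (0.76, 0.98), (0.88, 1.00), (1.00, 1.00)]


def trapezoid_area(curve):
    """Area under a piecewise-linear curve assumed to start at the origin."""
    area, x_prev, y_prev = 0.0, 0.0, 0.0
    for x, y in curve:
        area += (x - x_prev) * (y + y_prev) / 2.0
        x_prev, y_prev = x, y
    return area


auc = trapezoid_area(points)   # area under the six-point ROC curve
acd = auc - 0.5                # area between the curve and the 45 degree diagonal
```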

3.1.2  The  ACD 

The  ACD  is  characteristic  of  the  degree  of  separation  between  the  two  species 
(groups). When a tested variable provides essentially no discrimination ability between two
groups, a Cartesian plot of (1 - TN, TP) yields an experimental line close to the 45° line. This
occurs when the dispersion of variable responses in the frequency plot for the two groups
overlaps to a significant extent; the ACD approaches zero. When a variable provides a
significant degree of separation, a plot of (1 - TN, TP) yields an exponential curve that starts at
0,  0,  continues  nearly  straight  up  to  approach  0,  1,  and  then  is  almost  horizontal  to  1,  1.  The 
ACD  in  this  case  approaches  0.5,  and  this  signifies  a  high  degree  of  separation,  or  specificity, 
between  the  two  measured  groups.  This  occurs  when  the  frequency  distribution  of  variable 
responses  for  two  groups  has  a  high  degree  of  separation  with  very  little  to  no  overlap. 


In  Figure  2,  the  ACD  is  0.2896.  By  multiplying  the  ACD  by  200,  a  percent  degree  of 
separation was obtained. Thus, 0.2896 × 200 = 57.9 or ~58%. Therefore, the sepal length has a
58%  probability  of  distinguishing  between  the  two  species  of  iris,  which  is  a  relatively  poor 
degree  of  separation.  This  ROC  curve  process  was  repeated  for  the  sepal  width  between  the  two 
iris  species.  An  ACD  of  0.1636  was  obtained,  and  the  percent  separation  was  equal  to  32.7%. 
Figures  3  and  4  present  the  frequency  distribution  plot  and  the  ROC  curve,  respectively.  The 
ROC  curve  was  closer  to  the  diagonal  line  in  Figure  4  as  compared  to  Figure  2.  The  sepal  width 
displayed  approximately  half  the  degree  of  separation  (32.7%)  with  respect  to  the  sepal  length 
(58%)  for  the  Fisher  data  set. 
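
The factor of 200 follows from the range of the ACD: an ideal curve has an ACD of 0.5, which should map to 100%. Written out for the two sepal variables, using only the values quoted above:

$$\text{percent separation} = 200 \times \mathrm{ACD}$$

$$200 \times 0.2896 = 57.92 \approx 58\% \ \text{(sepal length)}, \qquad 200 \times 0.1636 = 32.72 \approx 32.7\% \ \text{(sepal width)}$$

The ratio $32.72 / 57.92 \approx 0.56$ is the basis for describing the sepal width separation as approximately half that of the sepal length.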

The  analysis  was  repeated  for  the  petal  lengths  and  widths  for  the  two  species. 

Table  2  presents  the  ACD  and  percent  separation  for  the  analysis  of  the  four  variables.  Figure  5 
is a plot of the four ACD values in Table 2. Columns 1-4 in Figure 5 represent the sepal length,
sepal width, petal length, and petal width, respectively, for each of the three species in Table 1.
Table 2 and Figure 5 provide significant information, because the results show that in Fisher's
original data, the sepal contribution did not provide as good discrimination ability as the petal
information. This result was attained by use of simple univariate ROC curve statistics as
compared to the classical multivariate data analysis treatment.1-4

Both  the  petal  length  and  width  allowed  for  a  96%  degree  of  separation  between  the 
dispersion  of  the  two  measurements.  Therefore,  the  original  four  variables  may  be  reduced  to 
only  one  variable,  i.e.,  either  petal  length  or  petal  width,  for  a  satisfactory  degree  of  separation 
between  the  two  iris  species. 

3.1.3 I. setosa and I. virginica

The above process was repeated in a comparison of the four iris variables between the
I. setosa and I. virginica species. Table 3 provides the reduced data set, and Figure 6 is a plot of
the ACD values from Table 3. Note that petal lengths and widths, separately, provided a 100%
degree of separation, and both produced an ideal ROC curve with an ACD of 0.5. This result
signifies that multivariate analysis is not necessary when a simple univariate ROC curve analysis
provides a satisfactory level of discrimination. Figure 7, A-D includes frequency distribution
plots of the four variables for the I. setosa (open circles) and I. virginica (triangles) species with
information for the ROC curves shown in the figure insets. The petal information provided a
100% degree of separation for the two species, whereas the sepal length data produced a very
high  degree  of  separation  (96.9%).  Sepal  width  information  resulted  in  a  relatively  poor  degree 
of  separation  (66.8%). 

3.1.4 I. setosa and I. versicolor

Univariate iteration was performed for a comparison of the I. setosa (open circles)
and I. versicolor (triangles) species as shown in Figure 8, A-E and Table 4. Data reduction and
analysis were performed in a similar fashion as shown in Figures 6 and 7, A-D and Tables 2 and
3. Again, the petal width and length variables achieved a 100% separation between the two
species. For the entire data set, there was no need for further analysis, because the three iris
species could be distinguished from one another with a 96 to 100% degree of separation where
only one variable was necessary for the discrimination of any two groups.

3.1.5 Iris Flower Analysis


The  above  data  analysis  essentially  identifies  variables  that  contribute  a  significant 
amount  of  discriminating  information.  Generally,  experimental  data  consist  of  many  variables 
(hundreds  to  thousands)  such  as  spectroscopy  (wavenumber  or  wavelength)  and  spectrometry 
(m/z  or  ion  mobility  drift  time)  spectra.  It  is  these  types  of  data  sets  that  would  most  benefit  by  a 
reduction  of  variables.  The  variable  reduction  also  identifies  the  least  discriminating  and  noisy 
variables  that  need  not  be  considered  in  subsequent  analyses.  Also,  further  analysis  is  necessary 
when  a  reduction  of  variables  procedure  provides  multiple  variables  that  yield  less  than 
satisfactory  discrimination  capability. 
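
The screening idea in Part 1 can be sketched as a per-variable ACD filter. Everything specific here is an assumption made for illustration: the function names, the synthetic Gaussian data, the cutoff of 0.4, and the use of the absolute ACD so that the result does not depend on which group happens to lie on the left. The text itself states only that variables with relatively high ACD values are retained.

```python
import random


def acd(left, right):
    """Signed area between the ROC curve and the diagonal (AUC - 0.5)."""
    fp_prev = tp_prev = area = 0.0
    for t in sorted(set(left) | set(right)):
        tp = sum(v <= t for v in left) / len(left)
        fp = sum(v <= t for v in right) / len(right)
        area += (fp - fp_prev) * (tp + tp_prev) / 2.0   # trapezoidal rule
        fp_prev, tp_prev = fp, tp
    return area - 0.5


def screen_variables(group_a, group_b, min_abs_acd=0.4):
    """Return {column index: ACD} for columns whose |ACD| reaches the cutoff."""
    kept = {}
    for col in range(len(group_a[0])):
        score = acd([row[col] for row in group_a],
                    [row[col] for row in group_b])
        if abs(score) >= min_abs_acd:
            kept[col] = score
    return kept


random.seed(0)
# Synthetic records: column 0 separates the groups, column 1 is noise
group_a = [[random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)] for _ in range(200)]
group_b = [[random.gauss(5.0, 1.0), random.gauss(0.0, 1.0)] for _ in range(200)]
kept = screen_variables(group_a, group_b)
```

In this synthetic case only column 0 survives the filter, mirroring how the noisy or weakly discriminating variables of a spectrum would be dropped before any merging step.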

3.2  Data  Integration 

3.2.1  Variable  Processing 

The  iris  data  set  did  not  require  further  analysis.  However,  to  show  the  strengths  of 
the  data  analysis  concept  presented  herein,  all  four  variables  were  considered  for  further 
processing,  and  the  resulting  three  different  sets  of  species  analyses  were  analyzed  and 
compared. 

3.2.2  Data  Space  Point  Rotation 

Part  2  of  the  data  analysis  consisted  of  merging  the  variables  in  a  systematic  fashion 
with  an  ROC  curve  analysis  at  each  variable  inclusion.  This  accounting  of  the  variables  took 
place  with  two  groups  of  experiments  at  a  time.  In  the  case  of  the  iris  data,  analysis  of  three 
species required three sets of two groups: I. setosa, I. versicolor; I. setosa, I. virginica; and
I. versicolor, I. virginica. Each set was treated separately with all four variables in a
systematic procedure that used univariate statistics. The I. versicolor, I. virginica pair of species
was  addressed  first.  The  figure  symbols  are  as  presented  above. 

3.2.2.1 Sepal Variables, V1,2

Instead of a frequency plot of distance for only one variable, an (x, y) pair of axes was
constructed in which the abscissa was that of variable 1 (V1, sepal length), and the ordinate was
that of variable 2 (V2, sepal width). Figure 9A shows the point plot for the I. versicolor and I.
virginica  species.  Fifty  points  were  plotted  for  each  species.  A  series  of  steps  was  performed  in 
either  of  two  equivalent  ways,  and  one  of  the  procedures  is  presented  in  a  comprehensive 
fashion. 


The  data  space  of  points  was  uniformly  rotated  in  10°  increments,  and  an  analysis 
was  performed  at  each  angle  increment  including  an  ROC  curve  ACD  determination.  At  the 
start,  no  rotation  was  necessary,  and  this  was  labeled  as  a  0°  rotation  (Figure  9A).  A 
perpendicular  line  was  drawn  to  the  abscissa  from  all  100  points.  This  was  the  same  as  the 
original x axis or V1 axis, where V1 is a label for the first variable vector or sepal length.
Intensity bins were formed on the x axis, and the numbers of filled circle and triangle points in
each bin were summed and plotted in Figure 9B as a frequency distribution. The two sets of
points yielded two distribution curves. An ROC curve analysis was performed on the two
distributions as shown in Figure 9C to calculate an ACD. The abscissa placements of the
sepal lengths of all 100 points, the 0° rotation value, and the ACD were then
saved.


The  100  points  were  then  uniformly  rotated  10°  (Figure  9D).  A  perpendicular  line 
from  all  100  points  was  drawn  to  the  x  axis;  this  can  be  considered  a  modified  x  axis  from  the 
original  abscissa.  Intensity  bins  were  formed  on  this  new  x  axis,  and  the  numbers  of  filled  circle 
and  triangle  points  were  counted  and  plotted  (Figure  9E)  as  a  frequency  distribution.  The  two 
sets  of  points  formed  two  distribution  curves  as  in  Figure  9E,  and  the  ROC  curve  is  shown  in 
Figure 9F. The ACD was noted with the 10° modified x axis, along with the placement of all 100 points
on that axis.

These  steps  were  repeated  at  every  10°  increment  of  data  space  point  rotation.  An 
ROC  curve  ACD  was  derived  for  each  pair  of  distributions  of  the  sepal  distances  at  each  rotation 
increment.  Figure  9G  shows  a  330°  rotation  of  the  points,  which  produced  the  maximum  ACD, 
and Figure 9H shows the distribution of both sets of points for the I. versicolor and I. virginica
species from Figure 9G. Figure 9I is the ROC curve for the data in Figure 9G.

Figure  10A  is  a  plot  of  a  select  set  of  the  ROC  curves  at  different  data  space  rotations. 
Note that there are ROC curves that lie below the 45° line. Between 0° and 180°, the 1 - TN and
TP values were plotted with respect to the I. versicolor points lying to the left of the I. virginica
species in a V1-V2 plot. Between 180° and 360°, the points were rotated such that the I.
versicolor points were to the right of the I. virginica species. This produced ROC curves below
the diagonal line. Figure 10B is a plot of the rotation angle versus its respective ROC curve ACD
value.  The  330°  rotation  provided  the  maximum  ACD  value  of  0.29.  An  ACD  of  0.29  is 
equivalent  to  a  58%  degree  of  separation  for  the  two  iris  species  using  both  sepal  dimensions.  An 
ACD  of  0.29  is  not  a  satisfactory  degree  of  separation;  rather,  an  ACD  close  to  0.5  is  desired. 
Therefore,  it  was  necessary  to  consider  the  next  variable  for  inclusion  and  merging  to  possibly 
improve  the  ACD  experimental  value. 
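
The angle-versus-ACD scan can be sketched with synthetic points. This is an assumption-laden illustration rather than the authors' procedure: the function names, the counterclockwise rotation convention, and the Gaussian test data are hypothetical, and a threshold at every observed value stands in for the intensity bins. It shows one property that matches the curves lying below the diagonal: rotating the points by a further 180° mirrors their positions on the x axis, so the signed ACD changes sign.

```python
import math
import random


def acd(left, right):
    """Signed area between the ROC curve and the diagonal (AUC - 0.5)."""
    fp_prev = tp_prev = area = 0.0
    for t in sorted(set(left) | set(right)):
        tp = sum(v <= t for v in left) / len(left)
        fp = sum(v <= t for v in right) / len(right)
        area += (fp - fp_prev) * (tp + tp_prev) / 2.0   # trapezoidal rule
        fp_prev, tp_prev = fp, tp
    return area - 0.5


def rotate_and_project(points, theta_deg):
    """Rotate points counterclockwise by theta_deg; return their new x values."""
    t = math.radians(theta_deg)
    return [x * math.cos(t) - y * math.sin(t) for x, y in points]


random.seed(1)
# Synthetic two-variable records for two partly overlapping groups
group_a = [(random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)) for _ in range(50)]
group_b = [(random.gauss(1.0, 1.0), random.gauss(0.5, 1.0)) for _ in range(50)]

# ACD at every 10 degree increment of data space rotation
curve = {theta: acd(rotate_and_project(group_a, theta),
                    rotate_and_project(group_b, theta))
         for theta in range(0, 360, 10)}
best_theta = max(curve, key=curve.get)
```

Plotting `curve` gives the analogue of an angle-versus-ACD figure for the synthetic data, and `best_theta` is the rotation whose x-axis placements would be carried forward as the merged vector.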

3.2.2.2 Sepal Variables and Petal Length, V1-3

The vector generated by the angle at 330° was labeled V1,2, because it provided the
best separation between the two distributions with respect to the ACD value with the sepal
dimensions. V1,2 became the new abscissa, and the ordinate represented the vector for the third
variable, i.e., petal length or V3. Figure 11A presents a data space with the abscissa and ordinate
as V1,2 and V3, respectively. All 100 points were plotted in the data space accordingly. Note
that each of the 100 points had its x value on the V1,2 axis. All steps and procedures were
performed with a 10° rotation increment along with an ROC curve analysis for the I. versicolor
and I. virginica species data. Figure 11B shows the point plot at a 250° rotation of the data
points, which yielded a maximum ACD value, and Figure 11C shows an overlay of the two
frequency distribution curves. Figure 11D presents an ROC curve analysis, and Figure 11E is a
plot that shows the angle of rotation versus the ACD. Note that the maximum ACD of 0.49
occurred at a 250° angle. An ACD of 0.49 is equivalent to a 98% degree of separation for the
two  iris  species.  Therefore,  the  petal  length  (V3)  provides  a  major  source  of  differentiation 
compared  to  both  sepal  variables. 

3.2.2.3 All Four Iris Variables, V1-4

The vector at 250° was referred to as V1-3, and it contained information from vectors
1, 2, and 3. This vector became the new abscissa, and variable 4 or petal width became V4 on the
ordinate.  All  100  points  were  plotted  in  the  data  space  (Figure  12A).  This  representation 
translated  a  four-dimensional  data  set  (four  variables  or  vectors)  into  a  two-dimensional  data 
space  without  loss  of  the  inherent  data  set  characteristics  and  trends.  Figure  12B  shows  a  330° 
rotation  of  the  dispersion  of  points,  and  Figure  12C  presents  an  overlay  of  the  two  frequency 
distribution  curves.  A  rotation  angle  of  330°  provided  the  highest  ACD  upon  an  ROC  curve 
analysis  (Figure  12D)  of  all  rotation  angles.  An  ACD  of  0.4998  was  obtained,  which  corresponds 
to a 99.6% degree of separation between the two iris species.

3.2.2.4 Petal Variables, V3,4

This  analysis  leads  to  the  question  regarding  whether  all  four  variables  are  necessary 
to  produce  a  99.6%  degree  of  separation.  To  answer  this  question,  the  data  set  was  analyzed 
using  only  petal  length  (V3)  and  petal  width  (V4).  This  reduced  the  problem  to  two  variables  and 
two groups, which is still beyond the standard ROC curve analysis of only one variable
distinguishing  between  two  groups.  Generally,  the  more  information  that  can  be  applied  to  a 
statistical  problem,  the  greater  the  degree  that  a  satisfactory  resolution  can  be  achieved.  Note  that 
this  new  statistical  technique  allows  for  many  variables  to  be  introduced  in  an  analysis  of  two 
groups  for  qualitative  as  well  as  ROC  curve  quantitative  and  selectivity  information.  As  more 
variables  are  considered,  a  larger  ACD  and  hence  better  discrimination  between  the  tw  o  groups 
is  achieved.  The  frequency  distributions  at  maximum  ACD  values  versus  the  numbers  of 
variables  considered  can  be  compared  in  Figures  9H  (Vl-2),  1 1C  (Vl-3),  and  12C  (Vl-4).  The 
respective  ACD  (percent  separation)  values  were  0.29  (58%),  0.49  (98%),  and  0.498  (99.96%). 
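
The percent separations quoted alongside each ACD are consistent with scaling the magnitude of the ACD by its maximum value of 0.5 (for example, 0.29 gives 58% and 0.49 gives 98%). The short sketch below makes that arithmetic explicit. The function name is hypothetical, and the use of the absolute value is an assumption based on Tables 3 and 4, where negative ACD values are listed with positive separations.

```python
def percent_separation(acd):
    """Convert an ACD (between -0.5 and 0.5) to a percent separation (0 to 100).

    Assumption: the sign of the ACD only records which group lies higher,
    so its magnitude is used.
    """
    return abs(acd) / 0.5 * 100.0


for value in (0.29, 0.49, 0.5, -0.3342):
    print(f"ACD {value:+.4f} -> {percent_separation(value):.1f}% separation")
```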

Figure 13A is a plot of petal length (V3) versus petal width (V4) for the I. versicolor
(filled  circles)  and  I.  virginica  (triangles)  species.  Figure  13B  presents  the  two  frequency 
distributions  of  the  points  in  Figure  13A,  and  Figure  13C  presents  the  ROC  curve.  Figure  13D 
shows  the  angle  versus  the  ACD  plot  for  the  iris  petal  information;  note  that  at  a  rotation  of  280°, 
the  maximum  ACD  value  was  achieved.  Figure  14A  shows  the  plot  where  the  points  in 
Figure  13A  were  rotated  280°,  and  Figure  14B  presents  its  frequency  distribution.  Figure  14C 
shows  the  ROC  curve  for  the  280°  rotation  of  points.  Using  only  the  V3  and  V4  variables 
produced  an  ACD  of  0.495  (99%  separation)  compared  to  using  all  four  variables,  which  yielded 
an  ACD  of  0.4998  or  99.6%  separation  (Figure  12,  C  and  D).  The  difference  in  separation 
efficiency  of  the  two  species  was  negligible. 

The entire process can be repeated for discrimination purposes for any two groups or
cases consisting of any number of variables, for example, for the I. setosa and
I. virginica species. However, the analysis in Part 1 established that the petal lengths and widths
were sufficient to provide a 100% separation (Table 3). This was also true for the separation of
the I. setosa and I. versicolor species (Table 4). These qualitative and quantitative discrimination
analyses were accomplished without the use of matrix algebra as required by multivariate data
analysis methods.
3.3 Alternate Method of Analysis


The above analysis has an equivalent procedure that attains the same results. Instead
of rotating the entire data set of points at every angle increment, a vector is drawn at 10° from the
x axis. The points remain in their positions, and a perpendicular line is drawn from each of the
100 points onto the 10° vector. The positions of the points along the vector undergo a frequency
distribution analysis, which results in two frequency distributions. An ROC curve analysis takes
place, and the vector, angle, and ACD are stored. The vector is then placed 20° from the x axis,
and perpendicular lines are drawn to that new vector to denote the placement of the 100 points on
that vector. Repeated over the remaining angle increments, this procedure provides the
same results as the rotation of data space points.
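
The equivalence of the two procedures can be checked numerically with a short sketch. The function names and the random test points are hypothetical, and a counterclockwise rotation convention is assumed. Under that convention, reading the x coordinate after rotating the points by an angle matches projecting the unrotated points onto a vector at the negative of that angle, so a sweep over the full circle visits the same set of one-dimensional distributions either way.

```python
import math
import random


def rotate_then_read_x(points, angle_deg):
    """Rotate every point counterclockwise and keep the new x coordinate."""
    t = math.radians(angle_deg)
    return [x * math.cos(t) - y * math.sin(t) for x, y in points]


def project_onto_vector(points, angle_deg):
    """Leave the points in place and drop perpendiculars onto a unit vector."""
    t = math.radians(angle_deg)
    ux, uy = math.cos(t), math.sin(t)
    return [x * ux + y * uy for x, y in points]


# 100 arbitrary points standing in for the data space (not the Fisher data).
rng = random.Random(1)
points = [(rng.uniform(40, 80), rng.uniform(20, 40)) for _ in range(100)]

max_diff = 0.0
for angle in range(0, 360, 10):
    rotated = rotate_then_read_x(points, angle)
    projected = project_onto_vector(points, -angle)
    max_diff = max(max_diff, max(abs(a - b) for a, b in zip(rotated, projected)))

print(f"largest difference over the sweep: {max_diff:.2e}")
```

Since the one-dimensional coordinates agree, the frequency distributions, ROC curves, and ACD values derived from them agree as well.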


4.  CONCLUSIONS 

A  univariate  statistical  method  was  presented  to  collect,  reduce,  and  analyze  a 
multivariable  response  for  replicates  of  more  than  two  cases  or  groups.  Conventional  ROC  curve 
analysis  is  the  backbone  for  the  method.  The  new  statistical  univariate  data  analysis  method 
herein  provides  an  ROC  curve  analysis  with  the  ability  to  incorporate  more  than  one  variable  and 
more  than  two  groups  in  the  analysis  and  conclusions  for  qualitative  differentiation  and 
selectivity purposes. The raw data retain their inherent trend and nature. No data set
normalization  or  scaling  procedures  are  necessary.  The  mean  and  standard  deviation,  which  are 
fundamental  values  for  multivariate  data  analysis,  are  not  needed.  By  nature,  the  ROC  curve 
procedures  can  be  applied  to  any  kind  of  data  distribution  in  addition  to  the  classic  Gaussian 
trend,  such  as  step  functions  and  skewed  distributions,  whether  they  are  linear  or  nonlinear. 
Multivariate  data  analysis,  on  the  other  hand,  necessarily  forces  a  data  set  into  a  linear  model, 
because  the  algorithm  relies  on  a  linear  combination  of  variables.  The  ROC  curve  method 
presented  here  has  the  ability  to  translate  a  multivariable  (or  multidimension  or  multivector)  data 
set into a one-variable or one-dimensional response analysis while preserving the inherent nature
of  the  distribution,  whether  it  is  linear  or  nonlinear. 
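
One way to see why a single-variable ROC analysis needs neither scaling nor a Gaussian shape is that the empirical ROC curve depends only on the rank order of the values, so any strictly increasing transformation leaves the ACD unchanged. The sketch below illustrates this with synthetic skewed data. The `acd` helper, the log-normal test data, and the chosen transformations are assumptions made for illustration, and the sketch covers the single-variable case only.

```python
import math
import random


def acd(neg, pos):
    """AUC - 0.5, with AUC taken as the fraction of (neg, pos) pairs in
    which the pos-group value is larger (ties count one half)."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg)) - 0.5


# Two skewed (log-normal) synthetic groups of 50 replicates each.
rng = random.Random(2)
group_a = [rng.lognormvariate(0.0, 0.5) for _ in range(50)]
group_b = [rng.lognormvariate(0.6, 0.5) for _ in range(50)]

raw = acd(group_a, group_b)
logged = acd([math.log(v) for v in group_a], [math.log(v) for v in group_b])
rescaled = acd([1000.0 * v + 7.0 for v in group_a],
               [1000.0 * v + 7.0 for v in group_b])

print(f"raw ACD      {raw:.4f}")
print(f"log ACD      {logged:.4f}")
print(f"rescaled ACD {rescaled:.4f}")
```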

ROC  curve  analyses  yield  results  that  determine  which  variables  are  best  used  for  the 
critical  decision-making  process  that  distinguishes  two  experimental  groups.  The  variables 
themselves dictate which parts of the entire data set will form subsets, and this provides groupings of
experimental  cases  (spectral  points  in  data  space).  The  measurement  vehicle  for  this 
determination  is  the  ACD.  For  a  multivariate  dendrogram  analysis,  it  is  the  distance  between 
each  case  that  determines  which  cases  form  the  subgroups  and  the  relative  separation  of  each 
subgroup. 


The  new  algorithm  using  ROC  curve  techniques  produces  a  “master”  vector.  The 
master  vector  (reference  or  library  vector)  is  a  systematic  integration  of  the  chosen  set  of 
variables. The master vector can be used in a practical situation to identify an unknown sample
(case); this application will be presented elsewhere. This method also preserves the TP, FN, and FP
probability  quantitative  information  of  a  data  set  with  multiple  species  of  interest. 

Combining  qualitative  and  quantitative  aspects  of  data  analysis  into  a  univariate 
statistical  method  provides  advantages  in  terms  of  algorithm  understanding  for  the  layman, 
computer  efficiency,  and  information-rich  analysis. 
GLOSSARY


ACD  area  between  the  curve  and  the  diagonal 

AUC  area  under  the  curve 

FN  false  negative 

FP  false  positive 

m/z  mass to charge ratio

ROC  receiver  operating  characteristics 

TN  true  negative 

TP  true  positive 

V1  vector (variable) 1

V1,2  statistical combination of vector (variable) 1 and vector (variable) 2

V1-3  statistical combination of vectors 1, 2, and 3
APPENDIX A
TABLES 


Table  1.  Fisher  data  set.  The  set  consisted  of  three  species  of  iris  flowers.  Replicate 
measurements  of  four  variables  included  sepal  and  petal  lengths  and  widths.  Data  are  in 
millimeters. 
Table 2. ROC ACD and percent separation between the I. versicolor and I. virginica species
for sepal and petal widths and lengths.

Variable           ACD        Separation, %
1  Sepal length    0.2896     57.9
2  Sepal width     0.1636     32.7
3  Petal length    0.4822     96.4
4  Petal width     0.4804     96.1

Table 3. ROC ACD and percent separation between the I. setosa and I. virginica species
for the sepal and petal widths and lengths.

Variable           ACD        Separation, %
1  Sepal length    0.4846     96.9
2  Sepal width     -0.3342    66.8
3  Petal length    0.5000     100
4  Petal width     0.5000     100

Table 4. ROC ACD and percent separation between the I. setosa and I. versicolor species
for the sepal and petal widths and lengths.

Variable           ACD          Separation, %
1  Sepal length    0.432600     86.5
2  Sepal width     -0.424600    84.9
3  Petal length    0.500000     100
4  Petal width     0.500000     100
Figure 1. Frequency plots of the sepal lengths of 50 separate I. versicolor (filled circles) and
50 I. virginica (triangles) flowers. Data are from the Fisher data set.

[Plot title: ROC Curve, Versicolor vs. Virginica (Sepal length)]

Figure 2. ROC curve of the Figure 1 data set.

[Plot title: Versicolor vs. Virginica]

Figure 3. Frequency plots of the sepal widths of 50 separate I. versicolor (filled circles) and
50 I. virginica (triangles) flowers. Data are from the Fisher data set.

[Plot title: ROC Curve, Versicolor vs. Virginica (Sepal width)]

Figure 4. ROC curve of the data shown in Figure 3.
[Plot title: Versicolor vs. Virginica; axis label: Variable Number]

Figure 5. Histogram of the data in Table 2. Columns 1-4 represent sepal length, sepal width,
petal length, and petal width, respectively.

[Plot title: Setosa vs. Virginica]

Figure 6. Histogram of the data in Table 3. Columns 1-4 represent sepal length, sepal width,
petal length, and petal width, respectively.
[Plot title: Setosa vs. Virginica]

Figure 7A. I. setosa (open circles) and I. virginica (triangles) frequency distribution plots
for sepal length for 50 replicates of each species. Inset displays ROC curve for the analysis.

Figure 7B. I. setosa (open circles) and I. virginica (triangles) frequency distribution plots
for sepal width for 50 replicates of each species. Inset displays ROC curve for the analysis.

Figure 7C. I. setosa (open circles) and I. virginica (triangles) frequency distribution plots
for petal length for 50 replicates of each species. Inset displays ROC curve for the analysis.

Figure 7D. I. setosa (open circles) and I. virginica (triangles) frequency distribution plots
for petal width for 50 replicates of each species. Inset displays ROC curve for the analysis.
[Plot title: Setosa vs. Versicolor]

Figure 8A. I. setosa (open circles) and I. versicolor (closed circles) frequency distribution plots
for sepal length for 50 replicates of each species. Inset displays ROC curve for the analysis.

Figure 8B. I. setosa (open circles) and I. versicolor (closed circles) frequency distribution plots
for sepal width for 50 replicates of each species. Inset displays ROC curve for the analysis.

Figure 8C. I. setosa (open circles) and I. versicolor (closed circles) frequency distribution plots
for petal length for 50 replicates of each species. Inset displays ROC curve for the analysis.

Figure 8D. I. setosa (open circles) and I. versicolor (closed circles) frequency distribution plots
for petal width for 50 replicates of each species. Inset displays ROC curve for the analysis.

[Plot title: Setosa vs. Versicolor]

Figure 8E. Histogram of the iris measurement variables versus ACD data in Table 4.
Columns 1-4 represent sepal length, sepal width, petal length, and petal width, respectively.


[Plot title: Versicolor vs. Virginica @ 0 degree angle]

Figure 9A. Sepal length (V1) versus sepal width (V2) point plot distribution of the 50 replicates
for both I. versicolor (filled circles) and I. virginica (triangles) with no rotation of the points.

Figure 9B. Frequency distribution of points shown in Figure 9A.

[Plot title: Versicolor vs. Virginica @ 0 degree angle; x axis: 1-Specificity (1-TN)]

Figure 9C. ROC curve of the frequency distribution shown in Figure 9B.

[Plot title: Versicolor vs. Virginica @ 10 degree angle; axis label: V1]

Figure 9D. Sepal length (V1) versus sepal width (V2) point plot distribution of the 50 replicates
for both I. versicolor (filled circles) and I. virginica (triangles) at a 10° rotation of the points.

Figure 9E. Frequency distribution of points shown in Figure 9D.

[Plot title: Versicolor vs. Virginica @ 10 degree angle]

Figure 9F. ROC curve of the frequency distribution shown in Figure 9E.

Figure 9G. Sepal length (V1) versus sepal width (V2) point plot distribution of the 50 replicates
for both I. versicolor (filled circles) and I. virginica (triangles) at a 330° rotation of the points.

[Plot title: Versicolor vs. Virginica @ 330 degree angle]

Figure 9H. Frequency distribution of points shown in Figure 9G.

[Plot title: Versicolor vs. Virginica @ 330 degree angle]

Figure 9I. ROC curve of the frequency distribution in Figure 9H.
Figure 10A. Plot of the ROC curves between 0° and 360° rotation of the point distribution
in  Figure  9D  in  20°  increments. 


Figure  10B.  Plot  of  angle  of  rotation  of  points  in  Figure  9D  versus  ACD. 
[Plot title: Versicolor vs. Virginica @ 0 degree angle]

Figure 11A. Sepal length, sepal width (V1,2) versus petal length (V3) point plot distribution
of the 50 replicates for both I. versicolor (filled circles) and I. virginica (triangles) with no
rotation of the points.

[Plot title: Versicolor vs. Virginica @ 250 degree angle]

Figure 11B. Sepal length, sepal width (V1,2) versus petal length (V3) point plot distribution
of the 50 replicates for both I. versicolor (filled circles) and I. virginica (triangles) at a
250° rotation.

[Plot title: Versicolor vs. Virginica @ 250 degree angle]

Figure 11C. Frequency distribution of the data in Figure 11B.

[Plot title: Versicolor vs. Virginica, V1-3 @ 250 degree angle]

Figure 11D. ROC curve plot of the frequency distribution in Figure 11C.

[Plot title: ACD vs. Rotated Angle (V1-3)]

Figure 11E. Plot of angle of rotation of points versus ACD.


[Plot title: Versicolor vs. Virginica @ 0 degree angle; axis label: V1-3]

Figure 12A. Sepal length, sepal width, petal length (V1-3) versus petal width (V4) point plot
distribution of the 50 replicates for both I. versicolor (filled circles) and I. virginica (triangles)
with no rotation of the points.

[Plot title: Versicolor vs. Virginica @ 330 degree angle]

Figure 12B. Sepal length, sepal width, petal length (V1-3) versus petal width (V4) point plot
distribution of the 50 replicates for both I. versicolor (filled circles) and I. virginica (triangles)
at a 330° rotation of the points.

[Plot title: Versicolor vs. Virginica @ 330 degree angle]

Figure 12C. Frequency distribution at a 330° rotation of points.

[Plot title: Versicolor vs. Virginica @ 330 degree angle]

Figure 12D. ROC curve plot of the data shown in Figure 12C.
Figure 13A. Petal length (V3) versus petal width (V4) point plot distribution of the 50 replicates
for both I. versicolor (filled circles) and I. virginica (triangles) with no rotation of the points.

[Plot title: Versicolor vs. Virginica @ 0 degree angle]

Figure 13B. Petal length (V3) versus petal width (V4) for both I. versicolor (filled circles) and
I. virginica (triangles): frequency distribution at 0° rotation of the points.

[Plot title: Versicolor vs. Virginica @ 0 degree angle; x axis: 1-Specificity (1-TN)]

Figure 13C. ROC curve of the data shown in Figure 13B.

[Plot title: ACD vs. Rotated Angle (V3,4)]

Figure 13D. Plot of angle of rotation of points shown in Figure 13A versus ACD.


[Plot title: Versicolor vs. Virginica @ 280 degree angle]

Figure 14A. Petal length (V3) versus petal width (V4) point plot distribution of the 50 replicates
for both I. versicolor (filled circles) and I. virginica (triangles) at 280° rotation of the points.

[Plot title: Versicolor vs. Virginica @ 280 degree angle; axis label: V3,4]

Figure 14B. Petal length (V3) versus petal width (V4) for both I. versicolor (filled circles) and
I. virginica (triangles): frequency distribution at 280° rotation of points.

[Plot title: Versicolor vs. Virginica @ 280 degree angle]

Figure 14C. ROC curve plot of the data shown in Figure 14B.